智能论文笔记

Optimal Diagonal Preconditioning: Theory and Practice

Zhaonan Qu , Wenzhi Gao , Oliver Hinder , Yinyu Ye , Zhengyuan Zhou

分类：机器学习 | (统计)机器学习

2022-09-02

预处理一直是优化和机器学习方面的主食技术。它通常会减少其应用于矩阵的条件数，从而加快优化算法的收敛性。尽管实践中有许多流行的预处理技术，但大多数人缺乏降低病数的理论保证。在本文中，我们研究了最佳对角线预处理的问题，以分别或同时分别或同时缩放其行或列来实现任何全级矩阵的条件数量的最大降低。我们首先将问题重新将问题重新制定为一个准凸出问题，并提供了一种基线一分配算法，该算法在实践中易于实现，其中每次迭代都包含SDP可行性问题。然后，我们建议使用$ o（\ log（\ frac {1} {\ epsilon}）））$迭代复杂度提出多项式时间潜在的降低算法，其中每个迭代均由基于Nesterov-todd方向的牛顿更新组成。我们的算法基于该问题的表述，该问题是von Neumann最佳生长问题的广义版本。接下来，我们专注于单方面的最佳对角线预处理问题，并证明它们可以作为标准双SDP问题配方，我们应用了有效的定制求解器并研究我们最佳的对角线预处理的经验性能。我们在大型矩阵上进行的广泛实验表明，与基于启发式的预处理相比，最佳对角线预处理在减少条件数方面的实际吸引力。

translated by 谷歌翻译

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Jian Cao , Chen Qian , Yihui Huang , Dicheng Chen , Yuncheng Gao , Jiyang Dong , Di Guo , Xiaobo Qu

分类：机器学习

2022-12-29

Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.

translated by 谷歌翻译

Principled and Efficient Transfer Learning of Deep Models via Neural Collapse

Xiao Li , Sheng Liu , Jinxin Zhou , Xinyu Lu , Carlos Fernandez-Granda , Zhihui Zhu , Qing Qu

分类：机器学习 | 人工智能 | 计算机视觉 | (统计)机器学习

2022-12-23

With the ever-growing model size and the limited availability of labeled training data, transfer learning has become an increasingly popular approach in many science and engineering domains. For classification problems, this work delves into the mystery of transfer learning through an intriguing phenomenon termed neural collapse (NC), where the last-layer features and classifiers of learned deep networks satisfy: (i) the within-class variability of the features collapses to zero, and (ii) the between-class feature means are maximally and equally separated. Through the lens of NC, our findings for transfer learning are the following: (i) when pre-training models, preventing intra-class variability collapse (to a certain extent) better preserves the intrinsic structures of the input data, so that it leads to better model transferability; (ii) when fine-tuning models on downstream tasks, obtaining features with more NC on downstream data results in better test accuracy on the given task. The above results not only demystify many widely used heuristics in model pre-training (e.g., data augmentation, projection head, self-supervised learning), but also leads to more efficient and principled fine-tuning method on downstream tasks that we demonstrate through extensive experimental results.

translated by 谷歌翻译

When Federated Learning Meets Pre-trained Language Models' Parameter-Efficient Tuning Methods

Zhuo Zhang , Yuanhang Yang , Yong Dai , Lizhen Qu , Zenglin Xu

分类：机器学习 | 自然语言处理

2022-12-20

With increasing privacy concerns on data, recent studies have made significant progress using federated learning (FL) on privacy-sensitive natural language processing (NLP) tasks. Much literature suggests fully fine-tuning pre-trained language models (PLMs) in the FL paradigm can mitigate the data heterogeneity problem and close the performance gap with centralized training. However, large PLMs bring the curse of prohibitive communication overhead and local model adaptation costs for the FL system. To this end, we introduce various parameter-efficient tuning (PETuning) methods into federated learning. Specifically, we provide a holistic empirical study of representative PLMs tuning methods in FL. The experimental results cover the analysis of data heterogeneity levels, data scales, and different FL scenarios. Overall communication overhead can be significantly reduced by locally tuning and globally aggregating lightweight model parameters while maintaining acceptable performance in various FL settings. To facilitate the research of PETuning in FL, we also develop a federated tuning framework FedPETuning, which allows practitioners to exploit different PETuning methods under the FL training paradigm conveniently. The source code is available at \url{https://github.com/iezhuozhuo/FedETuning/tree/deltaTuning}.

translated by 谷歌翻译

Let's Negotiate! A Survey of Negotiation Dialogue Systems

Haolan Zhan , Yufei Wang , Tao Feng , Yuncheng Hua , Suraj Sharma , Zhuang Li , Lizhen Qu , Gholamreza Haffari

分类：自然语言处理

2022-12-18

Negotiation is one of the crucial abilities in human communication, and there has been a resurgent research interest in negotiation dialogue systems recently, which goal is to empower intelligent agents with such ability that can efficiently help humans resolve conflicts or reach beneficial agreements. Although there have been many explorations in negotiation dialogue systems, a systematic review of this task has to date remained notably absent. To this end, we aim to fill this gap by reviewing contemporary studies in the emerging field of negotiation dialogue systems, covering benchmarks, evaluations, and methodologies. Furthermore, we also discuss potential future directions, including multi-modal, multi-party, and cross-cultural negotiation scenarios. Our goal is to provide the community with a systematic overview of negotiation dialogue systems and to inspire future research.

translated by 谷歌翻译

Modeling Global Distribution for Federated Learning with Label Distribution Skew

Tao Sheng , Chengchao Shen , Yuan Liu , Yeyu Ou , Zhe Qu , Jianxin Wang

分类：机器学习 | 计算机视觉

2022-12-17

Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels among clients are different, called ``label distribution skew''. Directly applying conventional federated learning without consideration of label distribution skew issue significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by the label distribution skew issue. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using the global information of data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art on several public benchmarks. Code is available at \url{https://github.com/Sheng-T/FedMGD}.

translated by 谷歌翻译

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

Leyuan Qu , Taihao Li , Cornelius Weber , Theresa Pekarek-Rosin , Fuji Ren , Stefan Wermter

分类：自然语言处理

2022-12-14

Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks respectively. However, it is still an open challenging research question to extract prosodic information because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for unsupervised training schemes to achieve robust large-scale and speaker-independent ASR. The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. Specifically, we identify, design, implement and integrate three crucial components in our proposed speech reconstruction model Prosody2Vec: (1) a unit encoder that transforms speech signals into discrete units for semantic content, (2) a pretrained speaker verification model to generate speaker identity embeddings, and (3) a trainable prosody encoder to learn prosody representations. We first pretrain the Prosody2Vec representations on unlabelled emotional speech corpora, then fine-tune the model on specific datasets to perform Speech Emotion Recognition (SER) and Emotional Voice Conversion (EVC) tasks. Both objective and subjective evaluations on the EVC task suggest that Prosody2Vec effectively captures general prosodic features that can be smoothly transferred to other emotional speech. In addition, our SER experiments on the IEMOCAP dataset reveal that the prosody features learned by Prosody2Vec are complementary and beneficial for the performance of widely used speech pretraining models and surpass the state-of-the-art methods when combining Prosody2Vec with HuBERT representations. Some audio samples can be found on our demo website.

translated by 谷歌翻译

On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning

Yiting Qu , Xinlei He , Shannon Pierson , Michael Backes , Yang Zhang , Savvas Zannettou

分类：机器学习

2022-12-13

The dissemination of hateful memes online has adverse effects on social media platforms and the real world. Detecting hateful memes is challenging, one of the reasons being the evolutionary nature of memes; new hateful memes can emerge by fusing hateful connotations with other cultural ideas or symbols. In this paper, we propose a framework that leverages multimodal contrastive learning models, in particular OpenAI's CLIP, to identify targets of hateful content and systematically investigate the evolution of hateful memes. We find that semantic regularities exist in CLIP-generated embeddings that describe semantic relationships within the same modality (images) or across modalities (images and text). Leveraging this property, we study how hateful memes are created by combining visual elements from multiple images or fusing textual information with a hateful image. We demonstrate the capabilities of our framework for analyzing the evolution of hateful memes by focusing on antisemitic memes, particularly the Happy Merchant meme. Using our framework on a dataset extracted from 4chan, we find 3.3K variants of the Happy Merchant meme, with some linked to specific countries, persons, or organizations. We envision that our framework can be used to aid human moderators by flagging new variants of hateful memes so that moderators can manually verify them and mitigate the problem of hateful content online.

translated by 谷歌翻译

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

Xiaoye Qu , Jun Zeng , Daizong Liu , Zhefeng Wang , Baoxing Huai , Pan Zhou

分类：自然语言处理

2022-12-13

Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, the distant supervision may induce noisy labels, thus undermining the robustness of the learned models and restricting the practical application. To relieve this problem, recent works adopt self-training teacher-student frameworks to gradually refine the training labels and improve the generalization ability of NER models. However, we argue that the performance of the current self-training frameworks for DS-NER is severely underestimated by their plain designs, including both inadequate student learning and coarse-grained teacher updating. Therefore, in this paper, we make the first attempt to alleviate these issues by proposing: (1) adaptive teacher learning comprised of joint training of two teacher-student networks and considering both consistent and inconsistent predictions between two teachers, thus promoting comprehensive student learning. (2) fine-grained student ensemble that updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise. To verify the effectiveness of our proposed method, we conduct experiments on four DS-NER datasets. The experimental results demonstrate that our method significantly surpasses previous SOTA methods.

translated by 谷歌翻译

MegaCRN: Meta-Graph Convolutional Recurrent Network for Spatio-Temporal Modeling

Renhe Jiang , Zhaonan Wang , Jiawei Yong , Puneet Jeph , Quanjun Chen , Yasumasa Kobayashi , Xuan Song , Toyotaro Suzumura , Shintaro Fukushima

分类：机器学习 | 人工智能

2022-12-12

Spatio-temporal modeling as a canonical task of multivariate time series forecasting has been a significant research topic in AI community. To address the underlying heterogeneity and non-stationarity implied in the graph streams, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea into Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging the Meta-Graph Learner powered by a Meta-Node Bank into GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variaty of non-stationary phenomena. Our model outperformed the state-of-the-arts to a large degree on all three datasets (over 27% MAE and 34% RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and be robustly adaptive to different anomalous situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.

translated by 谷歌翻译